Introduce a process-wide singleton engine for .collect(engine="gpu")#22410

Merged
rapids-bot[bot] merged 22 commits into rapidsai:main from madsbk:default_gpu_engine on May 12, 2026

@madsbk
Member

@madsbk madsbk commented May 7, 2026

lf.collect(engine="gpu") and pl.GPUEngine(executor="streaming") using the default cluster now route through a new process-wide DefaultSingletonEngine instead of constructing a fresh rapidsmpf Context, RMM adaptor, and Python executor for every query. Bootstrap now happens once per process rather than once per query.

DefaultSingletonEngine is a process-wide single-GPU singleton specialization of SPMDEngine: at most one live instance exists per process, it always uses a single-rank communicator plus default environment-derived settings, and repeated calls reuse the same engine instance until explicit shutdown.
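
The "at most one live instance, reused until explicit shutdown" behaviour can be illustrated with a minimal sketch. This is not the code from this PR: `SingletonEngine`, `get`, `shutdown`, and `bootstrap_count` are hypothetical names, and the expensive bootstrap (rapidsmpf `Context`, RMM adaptor, executor) is replaced by a counter.

```python
import threading


class SingletonEngine:
    """Hypothetical stand-in for an engine whose bootstrap is expensive."""

    _instance = None
    _lock = threading.Lock()
    bootstrap_count = 0  # counts how often the (simulated) bootstrap ran

    def __init__(self) -> None:
        # Stand-in for the costly one-time setup.
        type(self).bootstrap_count += 1
        self.closed = False

    @classmethod
    def get(cls) -> "SingletonEngine":
        """Return the live instance, creating it only if none exists."""
        with cls._lock:
            if cls._instance is None or cls._instance.closed:
                cls._instance = cls()
            return cls._instance

    def shutdown(self) -> None:
        """Explicitly release the instance; the next get() bootstraps again."""
        with type(self)._lock:
            self.closed = True
            if type(self)._instance is self:
                type(self)._instance = None
```

Under these assumptions, two consecutive `SingletonEngine.get()` calls return the same object and bootstrap once; only a call to `shutdown()` causes a later `get()` to bootstrap again.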

The default cluster enum value is renamed from Cluster.SINGLE to Cluster.DEFAULT_SINGLETON so the dispatch token better reflects the actual behavior.

This PR also removes the dead inline-context fallback in evaluate_pipeline, which was the original "single" execution path.

@madsbk madsbk self-assigned this May 7, 2026
@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 7, 2026
@github-actions github-actions Bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels May 7, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 7, 2026
@madsbk madsbk force-pushed the default_gpu_engine branch 9 times, most recently from 7e6beeb to 0fb9fe8 Compare May 9, 2026 08:07
Because each call forks a new child, process-wide side-effects
(the ``_bind_done`` flag, CPU affinity, environment variables) never
leak between tests or back into the pytest process.
def _run_in_subprocess(target: Callable[[], None]) -> None:
Member Author

This PR exposed some issues with the "fork" approach, so we now use "spawn" instead. Otherwise, the tests remain the same.

Comment thread python/cudf_polars/tests/experimental/test_dataframescan.py
@madsbk madsbk force-pushed the default_gpu_engine branch from 0fb9fe8 to e7fe81a Compare May 9, 2026 08:17
@madsbk madsbk force-pushed the default_gpu_engine branch from e7fe81a to 34498a9 Compare May 9, 2026 08:17
@madsbk madsbk added breaking Breaking change and removed non-breaking Non-breaking change labels May 9, 2026
@madsbk madsbk force-pushed the default_gpu_engine branch from 770331a to f2fa352 Compare May 9, 2026 12:28
@madsbk madsbk marked this pull request as ready for review May 9, 2026 13:59
@madsbk madsbk requested a review from a team as a code owner May 9, 2026 13:59
@madsbk madsbk requested a review from mroeschke May 9, 2026 13:59
@rapidsai rapidsai deleted a comment from copy-pr-bot Bot May 9, 2026
@madsbk madsbk requested a review from wence- May 11, 2026 14:57
Contributor

@TomAugspurger TomAugspurger left a comment

Partial review. I'll try to get back to this later, but don't wait for me.

Comment thread python/cudf_polars/cudf_polars/experimental/rapidsmpf/frontend/ray.py Outdated
streaming runtime.
* ``Cluster.DASK`` : Multi-GPU execution via Dask workers and the rapidsmpf
streaming runtime.
* ``Cluster.DEFAULT_SINGLETON`` : Single-GPU execution via the DefaultSingletonEngine.
Contributor

Let's confirm this is the name we want.

  1. If we ever change the default, then this name will become misleading.
  2. How commonly understood is "singleton"?
  3. This name doesn't say that the engine is single-GPU only, though the docs do.

And given that this is the default... maybe we can get away with updating the call sites to cluster: Cluster | None? And if we encounter None then we set up the default singleton cluster, and so we don't even need an enum name for this thing?

Member Author

I guess you are right, but I still prefer the more descriptive name. I like DEFAULT_SINGLETON and DefaultSingletonEngine because that is exactly what they are :)

If we ever decide to change the default implementation, I think we should change what DefaultSingletonEngine does internally rather than route the default path to an entirely different engine type.

To me, the important semantic is “process-wide implicit singleton default engine”, and the current name makes that very explicit.

Comment thread python/cudf_polars/tests/test_sink.py
@madsbk madsbk requested review from TomAugspurger and mroeschke May 12, 2026 05:52
Contributor

@wence- wence- left a comment

Tiny suggestions

madsbk and others added 2 commits May 12, 2026 02:05
…/default_singleton_engine.py

Co-authored-by: Lawrence Mitchell <wence@gmx.li>
@madsbk
Member Author

madsbk commented May 12, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 57e27e7 into rapidsai:main May 12, 2026
89 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python May 12, 2026
@madsbk madsbk deleted the default_gpu_engine branch May 12, 2026 12:39
TomAugspurger pushed a commit to TomAugspurger/pygdf that referenced this pull request May 12, 2026
rapidsai#22410)

`lf.collect(engine="gpu")` and `pl.GPUEngine(executor="streaming")` using the default cluster now route through a new process-wide `DefaultSingletonEngine` instead of constructing a fresh rapidsmpf `Context`, RMM adaptor, and Python executor for every query. Bootstrap now happens once per process rather than once per query.

`DefaultSingletonEngine` is a process-wide single-GPU singleton specialization of `SPMDEngine`: at most one live instance exists per process, it always uses a single-rank communicator plus default environment-derived settings, and repeated calls reuse the same engine instance until explicit shutdown.

The default cluster enum value is renamed from `Cluster.SINGLE` to `Cluster.DEFAULT_SINGLETON` so the dispatch token better reflects the actual behavior.

This PR also removes the dead inline-context fallback in `evaluate_pipeline`, which was the original `"single"` execution path.

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#22410
shrshi pushed a commit to shrshi/cudf that referenced this pull request May 12, 2026
galipremsagar pushed a commit to galipremsagar/cudf that referenced this pull request May 13, 2026
Labels

breaking Breaking change cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.

Projects

Status: Done

5 participants